-
Notifications
You must be signed in to change notification settings - Fork 114
Make telemetry batch size configurable and add time-based flush #622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Jesse Whitehouse <[email protected]>
Advise developers to use Python 3.7, 3.8, or 3.9 until #26 is fixed.
Includes a GitHub Action which checks for a valid sign-off on every proposed commit
* Isolate delay bounding logic * Move error details scope up one-level. * Retry GetOperationStatus if an OSError was raised during execution. Add retry_delay_default to use in this case. * Log when a request is retried due to an OSError. Emit warnings for unexpected OSError codes * Update docstring for make_request * Nit: unit tests show the .warn message is deprecated. DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
* Test with multiple python versions. * Update pyarrow to version 9.0.0 to address issue in relation to python 3.10 & a specific version of numpy being pulled in by pyarrow. Closes #26 Signed-off-by: David Black <[email protected]>
* Update changelog and bump to v2.0.4 * Specifically thank @dbaxa for this change. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
* Add test: cursors are closed when connection closes Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Moe Derakhshani <[email protected]>
Signed-off-by: Moe Derakhshani <[email protected]>
Signed-off-by: Moe Derakhshani <[email protected]> my [OAuth PR](https://github.com/databricks/databricks-sql-python/runs/8005844758?check_suite_focus=true) is blocked due to dco validation (following error): <img width="1202" alt="Screen Shot 2022-08-25 at 12 05 40 PM" src="https://user-images.githubusercontent.com/22279672/186747897-c9d57586-366f-41f9-aa66-609f2bf3911f.png"> We should try to avoid running dco for internal databricks employees: I am trying to relax the validation based on this guideline: https://github.com/dcoapp/app/blob/main/README.md#skipping-sign-off-for-organization-members and here: https://stackoverflow.com/questions/62969381/is-it-in-line-with-the-dco-that-a-github-sign-off-needs-and-publishes-full-name
Signed-off-by: Moe Derakhshani <[email protected]>
Signed-off-by: Moe Derakhshani <[email protected]>
Signed-off-by: Moe Derakhshani <[email protected]> this is undo of #42 till we figure out how to fix dco
Signed-off-by: Jesse Whitehouse <[email protected]>
This PR: * Adds the foundation for OAuth against Databricks account on AWS with BYOIDP. * It copies one internal module that Steve Weis @sweisdb wrote for Databricks CLI (oauth.py). Once ecosystem-dev team (Serge, Pieter) build a python sdk core we will move this code to their repo as a dependency. * the PR provides authenticators with visitor pattern format for stamping auth-token which later is intended to be moved to the repo owned by Serge @nfx and and Pieter @pietern
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Bump to v2.1.0 and update changelog Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Mohit Singla <[email protected]> Co-authored-by: Moe Derakhshani <[email protected]>
* Refactor so we can unit test `inject_parameters` * Add unit tests for inject_parameters * Remove inaccurate comment. Per #51, spark sql does not support escaping a single quote with a second single quote. * Closes #51 and adds unit tests plus the integration test provided in #56 Signed-off-by: Jesse Whitehouse <[email protected]> Co-authored-by: Courtney Holcomb (@courtneyholcomb) Co-authored-by: @mcannamela
Addresses https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949 Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Sai Shree Pradhan <[email protected]>
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Signed-off-by: Sai Shree Pradhan <[email protected]>
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
@@ -141,10 +141,13 @@ def __init__( | |||
auth_provider, | |||
host_url, | |||
executor, | |||
batch_size=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put the default here instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the default in TelemetryClient.__init__
failed because if a user doesn't provide the telemetry_batch_size
argument to Connection
, kwargs.get()
passes None
down the call chain, overriding any default in the TelemetryClient
constructor. Have changed it to set default in Connection.__init__
, ensuring TelemetryClientFactory
and TelemetryClient
always receive an explicit integer value.
Signed-off-by: Sai Shree Pradhan <[email protected]>
Signed-off-by: Sai Shree Pradhan <[email protected]>
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
def _start_flush_thread(cls): | ||
"""Start the shared background thread for periodic flushing of all clients""" | ||
cls._flush_event.clear() | ||
cls._flush_thread = threading.Thread(target=cls._flush_worker, daemon=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a daemon thred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A daemon thread is used for the background telemetry flusher so it doesn't prevent the main application from exiting. Without it, Python would wait for the thread's long sleep interval to finish before the program can shut down, making the user's script appear to hang. This ensures the main program exits immediately as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern here:
Daemon threads can be killed mid-operation when the main program exits, which might lead to data loss or corruption.
which means that this might not perform the flush if the main program is killed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is already handled by the exception_hook
. The function _handle_unhandled_exception
flushes all the Telemetry clients.
Signed-off-by: Sai Shree Pradhan <[email protected]>
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Signed-off-by: Sai Shree Pradhan <[email protected]>
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
What type of PR is this?
Description
The flush timer is centralized in
TelemetryClientFactory
, single background thread to manage all connections. Keeping it inTelemetryClient
would mean creating a timer thread per connection.Used
threading.Thread
withthreading.Event
. Thethreading.Event
acts as a thread-safe shutdown signal, and itswait(timeout)
method allows the thread to wait for the next flush interval while remaining immediately responsive to a shutdown command.While
threading.Timer
could be used, it would create a new thread every flush interval as we need to create a timer after each execution.How is this tested?
Related Tickets & Documents
design doc
PECOBLR-654